Automatic Recognition of Linguistic Replacements in Text Series Generated from Keystroke Logs
Abstract
This paper introduces a toolkit for detecting replacements of grammatical and semantic structures in ongoing text production that has been logged as a chronological series of computer interaction events (so-called keystroke logs). The use case is human translation, where replacements can indicate translator behaviour that gives rise to the features distinguishing translations from non-translated texts. The toolkit uses a novel CCG chart parser customised to recognise grammatical words independently of space and punctuation boundaries. On the basis of this linguistic analysis, structures in different versions of the target text are compared and classified by ‘equivalence judges’ as potential equivalents of the same source text segment. In this way, replacements of grammatical and semantic structures can be detected. Beyond the task at hand, the approach will also be useful for analysing other types of spaceless text, such as Twitter hashtags, and texts in agglutinative or spaceless languages such as Finnish or Chinese.
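As a rough illustration of the underlying idea only, the sketch below replays a keystroke log into a series of text versions and reports word-level replacements between two versions. It is not the toolkit described in the abstract: the event format `(kind, position, payload)` and the function names `replay` and `replacements` are assumptions made for this example, and plain whitespace tokenisation with Python's `difflib` stands in for the CCG chart parser and the equivalence judges.

```python
from difflib import SequenceMatcher


def replay(events):
    """Replay hypothetical (kind, position, payload) events into text versions.

    "insert" events carry the inserted string as payload; "delete" events
    carry the number of characters removed starting at `position`.
    Returns the text as it stood after each event.
    """
    text = ""
    versions = []
    for kind, pos, payload in events:
        if kind == "insert":
            text = text[:pos] + payload + text[pos:]
        elif kind == "delete":
            text = text[:pos] + text[pos + payload:]
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
        versions.append(text)
    return versions


def replacements(old, new):
    """Return (old_span, new_span) pairs that differ between two versions.

    Assumption: words are separated by whitespace. This is exactly the
    simplification the abstract's parser is designed to avoid.
    """
    old_tokens, new_tokens = old.split(), new.split()
    matcher = SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    return [
        (" ".join(old_tokens[i1:i2]), " ".join(new_tokens[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag == "replace"
    ]


# Invented log: the writer types a phrase, deletes "big", then types "large".
events = [
    ("insert", 0, "the big house"),
    ("delete", 4, 3),
    ("insert", 4, "large"),
]
versions = replay(events)
print(replacements(versions[0], versions[-1]))  # [('big', 'large')]
```

A surface diff of this kind only finds spans that changed. Deciding whether the old and new spans are equivalents of the same source text segment is the part the abstract assigns to linguistic analysis and the equivalence judges.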
Similar resources
Keystroke dynamics as signal for shallow syntactic parsing
Keystroke dynamics have been extensively used in psycholinguistic and writing research to gain insights into cognitive processing. But do keystroke logs contain actual signal that can be used to learn better natural language processing models? We postulate that keystroke dynamics contain information about syntactic structure that can inform shallow syntactic parsing. To test this hypothesis, we...
Grounded Language Modeling for Automatic Speech Recognition of Sports Video
Grounded language models represent the relationship between words and the non-linguistic context in which they are said. This paper describes how they are learned from large corpora of unlabeled video, and are applied to the task of automatic speech recognition of sports video. Results show that grounded language models improve perplexity and word error rate over text based language models, and...
{ENTER}ing the Time Series {SPACE}: Uncovering the Writing Process through Keystroke Analyses
This study investigates how and whether information about students’ writing can be recovered from basic behavioral data extracted during their sessions in an intelligent tutoring system for writing. We calculate basic and time-sensitive keystroke indices based on log files of keys pressed during students’ writing sessions. A corpus of prompt-based essays was collected from 126 undergraduates al...
How Are Spelling Errors Generated and Corrected? A Study of Corrected and Uncorrected Spelling Errors Using Keystroke Logs
This paper presents a comparative study of spelling errors that are corrected as you type vs. those that remain uncorrected. First, we generate naturally occurring online error correction data by logging users’ keystrokes, and by automatically deriving pre- and post-correction strings from them. We then perform an analysis of this data against the errors that remain in the final text as well as a...
Utilizing Linguistic Context To Improve Individual and Cohort Identification in Typed Text
By Adam Goodkind. The process of producing written text is complex and constrained by pressures that range from physical to psychological. In a series of three sets of experiments, this thesis demonstrates the effects of linguistic context on the timing patterns of the production of keystrokes. We elucidat...